-
Notifications
You must be signed in to change notification settings - Fork 372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NodeStageVolume/NodeUnstageVolume #169
NodeStageVolume/NodeUnstageVolume #169
Conversation
e5e86f7
to
2697a9c
Compare
Still taking suggestions on naming. I have nothing against NodePublishDevice/NodeUnpublishDevice, I just think it could be better. I personally like: NodeStageVolume/NodeUnstageVolume. The naming is consistent with current calls as the “Node” part refers to the scope of the call and the “Volume” part follows the existing convention. “Stage” is the action word here that refers to the fact that this call is not finally publishing the volume for usage, but is performing an intermediary staging step to make it available for publishing. I am open for other suggestions on the naming. Other suggestions include: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
completed first pass, left some comments/suggestions. wondering about a possible relationship between these changes and #166 (support for hypervisor runtimes).
spec.md
Outdated
A Node Plugin MUST implement this RPC call if it has `PUBLISH_UNPUBLISH_DEVICE` node capability. | ||
This RPC is called by the CO when a workload that wants to use the specified volume is placed (scheduled) on a node. | ||
The Plugin SHALL assume that this RPC will be executed on the node where the volume will be used. | ||
This RPC MUST be called by the CO a maximum of once per node, per volume. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest: s/MUST/SHOULD/ -- the RPCs are idempotent, so it's possible that this RPC would be executed more than once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed
csi.proto
Outdated
@@ -325,7 +331,8 @@ message ControllerPublishVolumeRequest { | |||
|
|||
message ControllerPublishVolumeResponse { | |||
// The SP specific information that will be passed to the Plugin in | |||
// the subsequent `NodePublishVolume` call for the given volume. | |||
// the subsequent `NodePublishDevice` and `NodePublishVolume` calls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: spacing
csi.proto
Outdated
// request. The CO SHALL ensure uniqueness of global_target_path per | ||
// volume. | ||
// This is a REQUIRED field. | ||
string global_target_path = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming: if we renamed the RPCs to the "staging" variants (e.g. NodeStageVolume) then maybe we could call this the staging_target_path
? "global" seems a little far-reaching
csi.proto
Outdated
@@ -41,6 +41,12 @@ service Controller { | |||
} | |||
|
|||
service Node { | |||
rpc NodePublishDevice (NodePublishDeviceRequest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming: I prefer NodeStageVolume and NodeUnstageVolume
// The path to which the volume will be published. It MUST be an | ||
// absolute path in the root filesystem of the process serving this | ||
// request. The CO SHALL ensure uniqueness of target_path per volume. | ||
// The CO SHALL ensure that the path exists, and that the process | ||
// serving the request has `read` and `write` permissions to the path. | ||
// This is a REQUIRED field. | ||
string target_path = 4; | ||
string target_path = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we avoid renumbering proto fields now that we have a tagged/published version of the spec in the world?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding that this should be an OK change, it will introduce incompatablity between different CSI spec versions but I think thats OK. Thats why we have a supported_versions field right?
The main issue with changing proto tags is when the protos are actually persisted somewhere and new code with different tags comes along and reads the old protos and gets confused. I don't believe that is an issue here.
@saad-ali may be able to provide more context and a more definitive answer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Client/server mismatch of proto field number can cause issues. However, for v0.2, we should be ok making breaking changes, if needed.
spec.md
Outdated
| Volume does not exist | 5 NOT_FOUND | Indicates that a volume corresponding to the specified `volume_id` does not exist. | Caller MUST verify that the `volume_id` is correct and that the volume is accessible and has not been deleted before retrying with exponential back off. | | ||
| Volume published but is incompatible | 6 ALREADY_EXISTS | Indicates that a volume corresponding to the specified `volume_id` has already been published at the specified `global_target_path` but is incompatible with the specified `volume_capability` flag. | Caller MUST fix the arguments before retying. | | ||
| Operation pending for volume | 10 ABORTED | Indicates that there is a already an operation pending for the specified volume. In general the Cluster Orchestrator (CO) is responsible for ensuring that there is no more than one call "in-flight" per volume at a given time. However, in some circumstances, the CO MAY lose state (for example when the CO crashes and restarts), and MAY issue multiple calls simultaneously for the same volume. The Plugin, SHOULD handle this as gracefully as possible, and MAY return this error code to reject secondary calls. | Caller SHOULD ensure that there are no other calls pending for the specified volume, and then retry with exponential back off. | | ||
| Exceeds capabilities | 10 FAILED_PRECONDITION | Indicates that the CO has exceeded the volume's capabilities because the volume does not have MULTI_NODE capability. | Caller MAY choose to call `ValidateVolumeCapabilities` to validate the volume capabilities, or wait for the volume to be unpublished on the node. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the relationship between supporting these staging calls and SINGLE vs MULTI capability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these calls will only really be useful for MULTI capability plugins (ref counting moved to CO). There is no restriction on using them for SINGLE but the benefit they provide would just be organizational.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realize I may have misunderstood your question. This call should be supported for both SINGLE and MULTI capability.
spec.md
Outdated
|
||
The Plugin SHALL assume that this RPC will be executed on the node where the volume is being used. | ||
|
||
This RPC is typically called by the CO when the workload using the volume is being moved to a different node, or all the workload using the volume on a node has finished. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/is typically/MAY be/
s/all the workload/all the workloads/
s/has finished/have finished/
csi.proto
Outdated
// It MUST be set if the Node Plugin implements the | ||
// `PUBLISH_UNPUBLISH_DEVICE` node capability. | ||
// This is an OPTIONAL field. | ||
string global_target_path = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is the staging target path useful here - do we really need it for the use cases we have in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the path that the volume is staged to will likely be a mountpoint, the unstage operation will need to be aware of the location of the mountpoint for this specific volume to "unmount" the staging point.
csi.proto
Outdated
VolumeCapability volume_capability = 5; | ||
} | ||
|
||
message NodePublishDeviceResponse {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might there be a need for some opaque staging_volume_info
that would be passed downstream to subsequent NodePublishVolume
RPCs?
if we had this, would we need the (global/staging)_target_path as a first class field in NodePublishVolume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from what I understand staging_target_path would be the only field inside of staging_volume_info
. I may not be aware of other possible use cases and I wouldn't be against making this more open-ended. Do you know of any other fields in particular that would need to be added?
@@ -1021,6 +1056,104 @@ It is NOT REQUIRED for a controller plugin to implement the `LIST_VOLUMES` capab | |||
|
|||
### Node Service RPC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might it be worth adding an "RPC Interations" subsection after all of the node service RPCs (like we did for the controller RPCs: https://github.com/container-storage-interface/spec/blob/master/spec.md#rpc-interactions) in order to concisely describe the relationship between some node RPCs?
I'm suggesting this because it would be a convenient place to compare/contrast the staging volume calls from the publish volume calls: staging is once per node/volume vs publish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RPC Interactions
NodeStageVolume
, NodeUnstageVolume
, NodePublishVolume
, NodeUnpublishVolume
The following IS REQUIRED if the plugin advertises the STAGE_UNSTAGE_VOLUME
capability.
NodeStageVolume
MUST be called and return success once per volume per node before any NodePublishVolume
MAY be called for the volume.
All NodeUnpublishVolume
MUST be called and return success for a volume before NodeUnstageVolume
MAY be called for the volume.
Suggestions?
Hi @davidz627, Regarding naming, I personally like |
I believe the rationale for adding this as a separate call is so that the Storage Plugin will not have to reference count, this allows Adding this new functionality will push the "decision" to make these staging/unstaging mounts to the CO so that it may call these functionalities in specific and separate RPC's. This frees up |
Hi @davidz627, You know what, I’ve been feeling unwell today and clearly I had a massive brain fart when I suggested that. It’s obviously anathema to the intended behavior :) Mea culpa. |
Regarding naming, I keep coming back to NodePublishDevice, but that ties it to device-backed storage, and honestly, it’s a volume-centric API operation anyway. How about augmenting NodePublishVolume with a “source_path” field that, when specified, indicates that the operation should bind mount “source_path” to “target_path”. Otherwise the volume should be directly mounted to “target_path” per the norm. The NodeUnpublishVolume call would not have to change since it would be called once for each corresponding publication call. |
Thanks for the suggestion Andrew, we've added in staging_target_path in the publish call for the bind mount indication. It is an optional field, so if unspecified, the volume will be directly mounted to target_path. I have also not added it to NodeUnpublishVolume because of the reasoning you mentioned. |
d59d122
to
aaceb16
Compare
xref #166 |
Hi @davidz627, I guess I'm not sure why we need to create a new RPC when the existing ones would work as-is, and the SP could make the decision based on whether or not the |
NodeStageVolume/NodeUnstageVolume are required to setup/tear-down the staging_target_path mount. |
Hi @davidz627, Yes, but the point I'm trying to make is that they do not have to be. If the staging path does not exist the first time a call is made to |
Hi All, FWIW, @davidz627 and I had a conversation offline where I came around to the PR's solution. I was approaching the PR from the point-of-view that an SP must support COs that may or may not implement ref counting. David pointed out that all COs must implement ref counting -- IFF an SP advertises the Sorry for the confusion. I've had a touch of dumbassitis for the last several years and have been slow on the pickup it seems. :) |
53fd3f3
to
44e20d0
Compare
According to the current spec, only volumes with MULTI_NODE_XXX capabilities potentially need this extra RPC (because for SINGLE_NODE_XXX volumes, So for plugins that're capable of dealing with MULTI_NODE volumes, how many of them will benefit with this extra RPC? Can you provide some concrete examples? I imagine that most of the fuse based plugins (MULTI_NODE capable) won't benefit from this extra RPC. I think my main worry about this change is that it introduces two ways for a plugin to implement the same functionality. A plugin can either choose to implement this new RPC, or do ref counting itself. I understand the benefit of this: allowing plugin to be stateless and no need to do ref counting. But isn't that hard? In most of the case I imagine it's just parsing the mount table. One more note: this change is probably related to #166. It'll make supporting #166 even harder. |
@jieyu I know of a few Storage Plugin types that will/may benefit: Yes, the plugin will have a "choice" but if this type of functionality is required it makes sense for them to offload the work to the CO so that the SP can just respond "dumbly" to requests instead of doing its own ref counting and failure recovery. |
@davidz627 As far as I know, |
Many of the listed volumes support multi reader only. |
@jieyu many of the volumes listed support |
@davidz627 I am wondering for |
@jieyu I've done some more research into this. FC, ISCSI, NVMEOE, Lustre, RBD. Potentially Ceph and Rook (@sbezverk may be able to confirm) Most block devices have a non-trivial local attach step. Additionally, Any SINGLE_NODE_ plugin will benefit from if we clarify access modes to allow multiple workloads on a single node access the volume. Currently, that access mode paradigm is the one adopted by Kubernetes and I think it is the correct way to go. Multiple workloads accessing a SINGLE_NODE_ volume on the same node can have several benefits (live backup, migration comes to mind) and not too many drawbacks. |
Hi @davidz627, Oh boy, now we're rehashing #119 and #150. Look, myself and @codenrhoden are both in favor of supporting non If we're reopening this discussion, I'll simply say that I again think the existence of a separate Today there are several CSI plug-ins that implement private mount directories and stage They work great! Don't need separate RPCs either. If we just added the capability to indicate to a CO that it needs to handle ref counting then the above SPs could simply stop handling it inside Anyway, here we go again :) |
Right, lets ignore the SINGLE_NODE_ volumes for now. The new RPC will still benefit in design: The argument is that this new RPC is useful and doesn't just add an unnecessary RPCs. The main benefit (shifting ref-counting responsibility to the CO) is valid for all the above storage plugins. So my argument is that it is useful for a significant number of SPs to have this new RPC. It does not add any additional complexity to existing SP workflows if they choose not to implement this optional RPC. |
Hi @davidz627, To play devil's advocate:
Technically the benefit is derived from the ability of the SP to advertise the capability |
From earlier:
I want to make sure this is seen since no one commented on it. The more I consider this, the more I am confident that RPCs separate from the existing
I agree that the capability |
At one point I think we agreed that we **could** keep the spec as-is, w/o
support for staging RPCs. Doing so pushes the reference counting burden to
the plugin. One of the goals of CSI is to reduce the burden on plugin
writers, if we can. This goal sometimes competes with the goal of
maintaining a set of minimal RPCs.
Somewhere in this discussion there's the mention of a specific use case,
relating to an iSCSI plugin, that would benefit from this design.
I'd like to address the scenario you describe w/ the CO failing to receive
an error, or record an error, from NodeStageVolume: if the RPC errors out,
then the CO could retry because the RPC is idempotent; if the CO fails to
checkpoint the failed RPC attempt and fails over .. then that means it
should have no memory of completing NodeStageVolume successfully and SHOULD
subsequently re-attempt the RPC, right? If a plugin's NodePublishVolume RPC
expects staging_target_path to be set, and it isn't, I'd expect the plugin
to fail with, at least, an "invalid argument" response. Though perhaps we
should tighten up the spec here and specifically name an error code (maybe
a different one?) that clearly indicates that RPCs are probably being
called out of order?
…On Fri, Feb 9, 2018 at 1:49 AM, Schley Andrew Kutz ***@***.*** > wrote:
From earlier:
Hmm. If the consensus in #171
<#171> is that
an SP should be aware of how it is running, then I reiterate my objection
to the need for separate stage RPCs. An SP should know it advertises the
related capability and do the appropriate logic in the existing RPCs. This
is inline with @jdef <https://github.com/jdef>'s remarks about keeping
things tight.
I want to make sure this is seen since no one commented on it. The more I
consider this, the more I am confident that RPCs separate from the existing
NodePublishVolume and NodeUnpublishVolume is not the solution. Some
reasons:
- Multiple CO members have stated that an SP should be aware of how it
is running/configured. Thus a CO knows if it is running with the
STAGE_UNSTAGE_VOLUME capability and thus can handle the appropriate
logic in the NodePublishVolume RPC.
- The spec states that a CO will do its best effort to ensure that
RPCs are called in a serial fashion, but that if a CO or SP process crashes
then it is the responsibility of the SP to handle recovery behaviors. To
that end, it's very possible that a CO sends a NodeStageVolume request
to an SP that fails to complete the request and the CO also fails to
receive the error or record the failed attempt. In this case the CO may
send a NodePublishVolume request expecting the aforementioned
NodeStageVolume was completed successfully. Based on previous
conversations I expect that @saad-ali <https://github.com/saad-ali>
might say the SP should do its best effort to handle this behavior. I
double-checked the PR's changes -- there is no error code defined for
NodePublishVolume to return to a CO that indicates the request's
staging_target_path field is set but it appears NodeStageVolume was
not invoked. Without such an error, it would be easy to infer that the
intent is NodePublishVolume should try to do both the logic for
NodeStageVolume *and* NodePublishVolume. If NodePublishVolume has to
handle both in edge cases anyway, why not just consolidate it?
I agree that the capability STAGE_UNSTAGE_VOLUME is a good idea. I just
don't see the need for the additional RPCs. The spec should remain as
opaque and simple as possible, and CO members seem to agree to that much of
the time. This is one of those cases, an opportunity to either expand the
spec or keep it the way it is with minimal change to achieve the desired
functionality. I do not see a compelling reason why we are not electing to
use the simple, straight-forward, and simple solution now and expanding it
later and only if necessary.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#169 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ACPVLPt5E0Vbky33Jahwya9mNXfIyEpqks5tS-pvgaJpZM4RiShS>
.
|
James, You missed something in my remarks. I said as is with the new STAGE capability. The SP would still be relieved of ref counting, it would just handle the stage logic at the top of the existing RPCs. |
0dffa65
to
a239782
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple of questions.
Thanks!
| Volume does not exist | 5 NOT_FOUND | Indicates that a volume corresponding to the specified `volume_id` does not exist. | Caller MUST verify that the `volume_id` is correct and that the volume is accessible and has not been deleted before retrying with exponential back off. | | ||
| Volume published but is incompatible | 6 ALREADY_EXISTS | Indicates that a volume corresponding to the specified `volume_id` has already been published at the specified `staging_target_path` but is incompatible with the specified `volume_capability` flag. | Caller MUST fix the arguments before retying. | | ||
| Operation pending for volume | 10 ABORTED | Indicates that there is a already an operation pending for the specified volume. In general the Cluster Orchestrator (CO) is responsible for ensuring that there is no more than one call "in-flight" per volume at a given time. However, in some circumstances, the CO MAY lose state (for example when the CO crashes and restarts), and MAY issue multiple calls simultaneously for the same volume. The Plugin, SHOULD handle this as gracefully as possible, and MAY return this error code to reject secondary calls. | Caller SHOULD ensure that there are no other calls pending for the specified volume, and then retry with exponential back off. | | ||
| Exceeds capabilities | 10 FAILED_PRECONDITION | Indicates that the CO has exceeded the volume's capabilities because the volume does not have MULTI_NODE capability. | Caller MAY choose to call `ValidateVolumeCapabilities` to validate the volume capabilities, or wait for the volume to be unpublished on the node. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what cases would the node plugin be aware of this state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Node Plugin should always be aware of its "NodeServiceCapability". The SP is the one that sets the STAGE_UNSTAGE_VOLUME
capability in the first place.
It will receive the request parameters in the request and should be able to check whether staging_target_path is empty/nil.
spec.md
Outdated
// request. The CO SHALL ensure uniqueness of staging_target_path per | ||
// volume. | ||
// This is a REQUIRED field. | ||
string staging_target_path = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get that this falls in line with the publish RPC, but I'm wondering if this is something that really works here.
Since the CO would not consume this path, I'm not sure it's the CO's responsibility to know anything about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed this on slack, the description of this is really the path to publish for the node, and NodePublish
's target_path
is really the path for the single workload.
This effectively treats the CO as a storage mechanism which I am ok with for this case.
Perhaps the staging_
should be omitted in the stage RPC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think having the staging_
makes it especially clear that this is the same path as the one that is used as staging_target_path
in NodePublishVolume
in later calls. Removing it would be naming the same variable differently in different contexts and may introduce some confusion.
The CO MUST guarantee that this RPC is called and returns a success before any `NodePublishVolume` is called for the given volume on the given node. | ||
|
||
This operation MUST be idempotent. | ||
If the volume corresponding to the `volume_id` is already staged to the `staging_target_path`, and is identical to the specified `volume_capability` the Plugin MUST reply `0 OK`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we clarify what to do when volume ID exists with a different staging path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per the NodeStageVolumeRequest
description: The CO SHALL ensure uniqueness of staging_target_path per volume.
, so this is the CO's job to ensure. If the SP were to receive non-unique staging_target_path
this would be undefined behavior, and I am ok leaving it as that.
Would be willing to define the behavior if you think that this is not enough for the spec and have a suggestion for what should happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, uniqueness of the path is not an issue, it's that the SP got a 2nd target path for for a given volume ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've clarified the wording to The CO SHALL ensure that there is only one staging_target_path per volume.
hope that helps
a239782
to
f5f3333
Compare
/lgtm Will let @cpuguy83 provide the second approval. |
Hi All, I'm just making one last note that these new RPCs don't make things any easier on SP developers. The The reason is that SP developers should still try to be as exhaustive as they are able when it comes to publication/unpublication on nodes. For example:
Because developers will still need to implement exhaustive logic in both the Node Stage/Pub Unstage/Unpub RPCs, separating them into two sets of RPCs doesn't make anything easier. If anything it makes things more complicated for developers that have to duplicate checks across multiple RPCs. But hey, I guess y'all have already decided that this is what's happening. I just don't understand the desire to add to the spec when opaqueness should be the guiding principal and the existing RPCs in addition to perhaps a new capability and the |
One last thing, I know most of the project's voting members belong to COs. However, I'm a voice of someone working on both the CO side and developing SPs. I would ask that my arguments be given the consideration that dual-focus might justify. Having worked on more than a half dozen CSI SPs so far, I'm saying that this change doesn't make things easier because ultimately you don't want to fail fast -- your goal for an SP is for it to succeed, even when that means doing some exhaustive discovery logic. This change just means that logic is duplicated / chopped up into two locations, making things more complex on SP devs. I hear @saad-ali arguing that this makes things easier on SP devs, but I'm an SP dev, and I'm saying it doesn't. The only members that seem to be arguing for this are CO members. That seems odd to me. I'd just ask that y'all look at CSI-VFS's |
@akutz I don't think the CO's are arguing for this at all. I know this was brought up by the Kube team, but only at the request of some SP's.... so maybe we need to re-sync on this topic. I definitely agree that a good plugin would not rely simply on the CO to ensure these are correct. State will definitely get out of sync. |
Hi Brian, Yeah, this issue was filed by Saad. If SP devs raised it with him, I was unaware. |
Hi Brian, Anyway, my point, as you so succinctly restated it (thank you), is that SPs must, or rather should, be capable of detecting drift and reconciling it to sync state. These additional RPCs simply introduce new locations that must implement nearly identical reconciliation logic, and thus add little value for me, an SP dev. Unless we also add a new RPC that allows a CO to validate its expected state against a Node or Controller, I prefer keeping he number of mutative RPCs to a bare minimum. |
@akutz if It is an optional call that has its use and make development of many SP's easier as outlined above. If it is easier for your SP to implement everything in The arguments you made are all assuming that |
I’m arguing I’ve implemented multiple types of SPs and that my advice that this doesn’t actually make things easier should be given consideration. I know from experience that there is a common enough pattern between SPs that the same reasons I outlined above apply to the majority of SPs and thus this change has minimal actual value and perhaps even the opposite. |
It doesn't have to. Per spec, all an SP has to do is make the specified device available at the specified path. Lots of volume plugins do just this in Kubernetes without extensive additional mount table verification (which is very expensive) and are deployed at scale in production workloads without issues.
Again, this is not true. The CO SHALL ensure that the path exists. And a plugin can simply issue a bind mount. If the bind mount already exists it will simply succeed and be idempotent.
This addition does not (and is not intended) to simplify all plugins. It is an optional addition designed to make a certain class of plugins, those that have a local attach/detach step, easier to implement. It should go without saying, you should not implement it if it doesn't make sense for your plugin or complicates the logic (as it does in your case). Some plugins may choose to implement more complicated logic, but if we can lower the bar for writing a class of plugins, that will be beneficial to the ecosystem as a whole. I believe @davidz627 has diligently addressed the feedback so far. And this PR is close to done. @cpuguy83 do you have any other concrete concerns? @davidz627 could you please rebase the PR. |
@saad-ali I think the argument is that this makes the happy case simpler but does not help at all in the case that there are problems. Do we have some concrete examples of a SP that cannot work without this? |
f5f3333
to
58316d1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I no longer really have any dog in this fight, and while I stand by what I've said, I agree with @davidz627 that this is optional and doesn't harm SPs that don't wish to implement it. I just worry that the bigger picture with regards to state won't be as apparent to new SP devs and these new RPCs could be tricky to implement correctly. Either way, I'm going to approve the PR and wish y'all the best of luck!
@cpuguy83 the purpose of this call was always to lower the bar for writing a plugin, not to fix an blocking issue. Volume plugins can implement this logic in the existing I disagree that plugins that will use this will open themselves up for errors. If this is a concern, let's be more concrete about the error cases that would be more common. It is important to not that this is a model that many volume plugins already follow in Kubernetes without issue. "A workload referencing a specified volume is scheduled to a node for the first time" and "the last workload referencing a specified volume is removed from a node" are important events and plugins should be able to hook in to they if they choose. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Needs a rebase.
This patch handles issue container-storage-interface#119 by adding two new RPCs, "NodePublishDevice" and "NodeUnpublishDevice". These RPCs MUST be called by the CO if the Node Plugin advertises the "PUBLISH_UNPUBLISH_DEVICE" capability. Plugins that advertise this capability SHOULD defer volume reference counting to the CO.
…deStageVolume/NodeUnstageVolume. Addressed jdef comments. Added RPC interactions and reference counting section NodeStageVolume modifications: added publish_volume_info to response. Changed credential naming. Added credentials to unstage.
58316d1
to
068ae27
Compare
Rebased. Thanks everyone for the thorough reviews and questions! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Fixes Issue: #119
Adds two new RPCs, NodeStageVolume/NodeUnstageVolume. These RPCs MUST be called by the CO if the Node Plugin advertises the STAGE_UNSTAGE_VOLUME capability. Plugins that advertise this capability SHOULD defer volume reference counting to the CO.
This PR supersedes #122, @akutz gets credit for the majority of the code, I'm just picking up where he left off and pushing this through.
cc @saad-ali @jieyu @clintkitson @jdef